Abstract

Recent results have shown that the performance of bit-interleaved coded modulation (BICM) using 
convolutional codes in nonfading channels can be significantly improved when the interleaver takes 
a trivial form (BICM-T), i.e., when it does not interleave the bits at all. In this paper, we give a 
formal explanation for these results and show that BICM-T is in fact the combination of a TCM 
transmitter and a BICM receiver. To predict the performance of BICM-T, a new type of distance spectrum 
for convolutional codes is introduced, analytical bounds based on this spectrum are developed, and 
asymptotic approximations are also presented. It is shown that the minimum distance of the code is 
not the relevant optimization criterion for BICM-T. Optimal convolutional codes for different constraint
lengths are tabulated and asymptotic gains of about 2 dB are obtained. These gains are found to be
the same as those obtained by Ungerboeck's one-dimensional trellis coded modulation (1D-TCM), and
therefore, in nonfading channels, BICM-T is shown to be asymptotically as good as 1D-TCM.

I. Introduction

Coded modulation (CM) was introduced in 1974 when Massey proposed the idea of jointly 
designing the channel encoder and modulator [1]. This inspired Ungerboeck's trellis coded
modulation (TCM) [2], and Imai and Hirakawa's multilevel coding [3]. Bit-interleaved coded
modulation (BICM) [4]-[6] appeared in 1992 as an alternative for CM in fading channels. One
particularly appealing feature of BICM is that all the operations are bit-wise, i.e., off-the-shelf 
binary codes and Gray-mapped constellations are used at the transmitter's side and connected 
via a bit-level interleaver. At the receiver's side, reliability metrics for the coded bits (L-values) 
are calculated by the demapper, de-interleaved, and then fed to a binary decoder. This structure 
gives the designer the flexibility to choose the modulator and the encoder independently, which 
in turn allows, for example, for an easy adaptation of the transmission to the channel conditions 
(adaptive modulation and coding). This flexibility is arguably the main advantage of BICM over 
other CM schemes, and also the reason why it is used in almost all of the current wireless
communications standards, e.g., HSPA, IEEE 802.11a/g, IEEE 802.16, and DVB-S2 [6, Ch. 1].

Bit-interleaving before modulation was introduced in Zehavi's original paper [4] on BICM.
Bit-interleaving is indeed crucial in fading channels since it guarantees that consecutive coded
bits are sent over symbols affected by independent fades. This results in an increase (compared
to TCM) of the so-called code diversity (the suitable performance measure in fading channels), 
and therefore, BICM is the preferred alternative for CM in fading channels. BICM can also be 
used in nonfading channels. However, in this scenario, and compared with TCM, BICM gives a 
smaller minimum Euclidean distance (the proper performance metric in nonfading channels), and 
also a smaller constrained capacity [5]. If a Gray labeling is used, the capacity loss is small, and
therefore, BICM is still considered a valid option for CM over nonfading channels. However, the
decrease in minimum Euclidean distance makes BICM less appealing than TCM in nonfading 
channels. 

The use of a bit-level interleaver in nonfading channels has been inherited from the original 
works on BICM by Zehavi [4] and Caire et al. [5]. It simplifies the performance analysis of
BICM and is implicitly considered mandatory in the literature. However, the reasons for its 
presence are seldom discussed. 

Previously, we have shown in [7] how, by using multiple interleavers, the performance
of BICM can be improved in nonfading channels. Recently, however, it has been shown in
[8] that in nonfading channels, considerably larger gains (a few decibels) can be obtained if
the interleaver is completely removed from the transceiver's configuration. In other words, it
was shown that in nonfading channels BICM without an interleaver performs better than the
conventional configurations of [4], [5]. The results presented in [8] are solely numerical and an
explanation behind such an improvement is not given. In particular, [8] does not explain why the
obtained gains depend on the constraint length of the convolutional code (CC). Nevertheless, in
[8] some intuitive explanations (using the notion of unequal error protection) and a bit labeling
optimization are presented.

In this paper, we present a formal study of BICM with trivial interleavers (BICM-T) in 
nonfading channels, i.e., the BICM system introduced in [8] where no interleaving is performed.
We recognize BICM-T as the combination of a TCM transmitter and a BICM receiver and we 
develop analytical bounds that give a formal explanation of why BICM-T with CCs performs 
well in nonfading channels. We also introduce a new type of distance spectrum for the CCs which 
allows us to analytically corroborate the results presented in [8]. These gains are shown to appear
even for one of the simplest configurations one could think of, i.e., when the constraint length
K = 3 convolutional code with generators (5, 7) is used together with 4-ary pulse amplitude 
modulation (PAM). Asymptotic bounds are also developed and used to show that for the (5, 7) 
code and 4-PAM, an asymptotic gain of 2.55 dB is obtained compared to an uncoded system 
with the same spectral efficiency. Motivated by the fact that this gain is the same as that obtained by
Ungerboeck's one-dimensional TCM (1D-TCM), we search and tabulate optimum convolutional
codes for BICM-T. We show that a properly designed BICM system without interleaving performs
asymptotically as well as 1D-TCM, and therefore, BICM-T should be considered as a good
alternative for CM in nonfading channels. The main contribution of this paper is to present an
analytical model for BICM-T which is used to explain the results presented in [8] and also to 
design a BICM-T system in nonfading channels. 

II. System Model and Preliminaries 

Throughout this paper, we use boldface letters $\mathbf{c}_t = [c_{1,t}, \ldots, c_{L,t}]$ to denote length-$L$ row
vectors and capital boldface letters $\mathbf{C} = [\mathbf{c}_1^T, \ldots, \mathbf{c}_N^T]$ to denote matrices, where $(\cdot)^T$ denotes
transposition. We use $d_H(\mathbf{C})$ to denote the total Hamming weight of the matrix $\mathbf{C}$. We denote
probability by $\Pr(\cdot)$ and the probability density function (pdf) of a random variable $\Lambda$ by $p_\Lambda(\lambda)$.
The convolution between two pdfs is denoted by $p_{\Lambda_1}(\lambda) * p_{\Lambda_2}(\lambda)$ and $\{p_\Lambda(\lambda)\}^{*w}$ denotes the $w$-fold
self-convolution of the pdf $p_\Lambda(\lambda)$. A Gaussian distribution with mean value $\mu$ and variance
$\sigma^2$ is denoted by $\mathcal{N}(\mu, \sigma^2)$, the Gaussian function with the same parameters by
$\psi(\lambda; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(\lambda-\mu)^2}{2\sigma^2}\right)$, and the Q-function by
$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^\infty \exp\left(-\frac{u^2}{2}\right) du$. The polynomial
generators of the convolutional codes (CC) are given in octal notation.



A. System Model 

The BICM system model under consideration is presented in Fig. 1. We use a constraint length
$K$, rate $R = 1/2$ convolutional encoder connected to a 16-ary quadrature amplitude modulation (16-QAM)
constellation labeled by the binary reflected Gray code (BRGC) [9]. This configuration is
simple yet practical, yielding a spectral efficiency of two bits per real channel use. This example
is not restrictive, but it simplifies the presentation of the main ideas. The generalization
to other modulations and coding rates is possible but would increase the
complexity of the notation, potentially obscuring the main concepts of the analysis presented in this
paper.

The input sequence $\mathbf{i} = [i_1, \ldots, i_N]$ is fed to the encoder (ENC) which at each time instant
$t = 1, \ldots, N$ generates two coded bits $\mathbf{c}_t = [c_{1,t}, c_{2,t}]$. We use the matrix $\mathbf{C} = [\mathbf{c}_1^T, \ldots, \mathbf{c}_N^T]$
of size $2 \times N$ to represent the transmitted codeword. These coded bits are interleaved by $\Pi$,
where the different interleaving alternatives are discussed in detail in Sec. II-B. The coded
and interleaved bits are then mapped to a 16-QAM symbol, where the 16-QAM constellation is
formed by the direct product of two 4-ary pulse amplitude modulation (4-PAM) constellations
labeled by the BRGC. Therefore, we analyze the real part of the constellation only, i.e., one
of the constituent 4-PAM constellations. The mapper is defined as $\Phi : \{[11], [10], [00], [01]\} \to
\{-3\Delta, -\Delta, \Delta, 3\Delta\}$, where we define

$$\Delta \triangleq \frac{1}{\sqrt{5}} \tag{1}$$

so that the PAM constellation is normalized to unit average symbol energy, i.e., $E_s = 1$.
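The mapper and its normalization can be checked with a few lines of Python. This is only an illustrative sketch: the names `DELTA`, `BRGC_4PAM`, `map_bits`, and `average_energy` are hypothetical and not part of the original system description, and the value of $\Delta$ is the one implied by the unit-energy condition $E_s = 1$.

```python
import math

# Half the spacing between adjacent 4-PAM points, chosen so that E_s = 1
# (assumption: Delta = 1/sqrt(5), as implied by the unit-energy condition).
DELTA = 1 / math.sqrt(5)

# BRGC labeling from the text: [11], [10], [00], [01] -> -3D, -D, +D, +3D.
BRGC_4PAM = {
    (1, 1): -3 * DELTA,
    (1, 0): -1 * DELTA,
    (0, 0): +1 * DELTA,
    (0, 1): +3 * DELTA,
}


def map_bits(c1, c2):
    """Map one pair of (scrambled) coded bits [c1, c2] to a 4-PAM symbol."""
    return BRGC_4PAM[(c1, c2)]


def average_energy():
    """Average symbol energy for equiprobable symbols."""
    return sum(x * x for x in BRGC_4PAM.values()) / len(BRGC_4PAM)
```

Calling `average_energy()` should return a value equal to one up to floating-point rounding, and neighboring constellation points carry labels that differ in exactly one bit, which is the Gray property the analysis relies on.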

A quick inspection of the BRGC for 4-PAM reveals that the BRGC offers unequal error
protection (UEP) to the transmitted bits depending on their position. In particular, the bit at
the first position ($k = 1$) receives higher protection$^1$ than the bit at the second position ($k = 2$).
More details about this can be found in [7]. Moreover, for $k = 2$ a bit labeled by zero (inner
constellation points) will receive a lower protection than a bit labeled by one transmitted in
the same bit position (outer constellation points), and therefore, the binary-input soft-output
(BISO) channel for $k = 2$ is nonsymmetric. To simplify the analysis, we "symmetrize" the
channel by randomly inverting the bits before mapping them to the 4-PAM symbol, i.e., $\tilde{\mathbf{C}} =
\Pi(\mathbf{C}) \oplus \mathbf{S}$, where $\oplus$ represents modulo-2 element-wise addition and the elements of the matrix
$\mathbf{S} = [\mathbf{s}_1^T, \ldots, \mathbf{s}_N^T] \in \{0, 1\}^{2 \times N}$, where $\mathbf{s}_t = [s_{1,t}, s_{2,t}]$, are randomly generated bits. Such
a scrambling symmetrizes the BISO channel but it does not eliminate the UEP. We note that the
scrambling is introduced only to simplify the analysis, and therefore, it is not shown in Fig. 1
nor used in the simulations. This symmetrization was in fact proposed in [5], and as we will see
in Sec. IV, the bounds developed based on this symmetrization perfectly match the numerical
simulations.

At each time $t = 1, \ldots, N$, the coded and scrambled bits $\tilde{\mathbf{c}}_t$ are mapped to a symbol $x_t$, where
$x_t = \Phi(\tilde{\mathbf{c}}_t) \in \mathcal{X}$ and $\mathcal{X}$ is the 4-PAM constellation. The symbols $x_t$ are sent over an additive
white Gaussian noise (AWGN) channel so the received signal is given by $y_t = x_t + z_t$, where
$z_t$ is a zero-mean Gaussian noise with variance $N_0/2$. The signal-to-noise ratio is defined as
$\gamma = E_s/N_0 = 1/N_0$. At the receiver's side, reliability metrics for the bits are calculated by the
demapper $\Phi^{-1}$ in the form of logarithmic-likelihood ratios (L-values) as

$$\tilde{l}_{k,t} = \log \frac{\Pr(\tilde{c}_{k,t} = 1 \,|\, y_t)}{\Pr(\tilde{c}_{k,t} = 0 \,|\, y_t)}. \tag{2}$$

Since $\tilde{c}_{k,t} = c_{k,t} \oplus s_{k,t}$, it can be shown that $l_{k,t} = (-1)^{s_{k,t}} \tilde{l}_{k,t}$, i.e., after "descrambling", the sign
of the L-values is changed using $(-1)^{s_{k,t}}$. These L-values are deinterleaved and then passed to
the decoder which calculates an estimate of the information sequence $\hat{\mathbf{i}}$.
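The demapper in (2) and the descrambling step can be sketched as follows. The sketch assumes equiprobable symbols and the AWGN likelihood $p(y|x) \propto \exp(-\gamma (y-x)^2)$ that follows from the noise variance $N_0/2$ and $\gamma = 1/N_0$; the names `exact_l_value` and `descramble` are hypothetical.

```python
import math

DELTA = 1 / math.sqrt(5)
# BRGC-labeled 4-PAM: label (bit at k=1, bit at k=2) -> symbol.
LABELS = {(1, 1): -3 * DELTA, (1, 0): -DELTA, (0, 0): DELTA, (0, 1): 3 * DELTA}


def exact_l_value(y, k, gamma):
    """L-value of bit position k (1 or 2) given the observation y, cf. (2).

    Assumes equiprobable symbols, so the posterior ratio reduces to a ratio
    of summed likelihoods exp(-gamma * (y - x)^2).
    """
    num = sum(math.exp(-gamma * (y - x) ** 2)
              for bits, x in LABELS.items() if bits[k - 1] == 1)
    den = sum(math.exp(-gamma * (y - x) ** 2)
              for bits, x in LABELS.items() if bits[k - 1] == 0)
    return math.log(num / den)


def descramble(l_tilde, s):
    """Undo the scrambling on an L-value: l = (-1)^s * l_tilde."""
    return -l_tilde if s else l_tilde
```

For example, `exact_l_value(0.0, 1, gamma)` is zero for any `gamma` by symmetry of the first bit position around the origin, while the L-value of the second bit position is an even function of `y`.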

B. The interleaver 

Throughout this paper, three interleaving alternatives will be analyzed, cf. the block $\Pi$ in Fig. 1.
The first interleaving alternative is BICM with a single interleaver (BICM-S) introduced in [5].
It is the most commonly used in the literature and corresponds to an interleaver that randomly
permutes the bits $\mathbf{C}$ prior to modulation, where the permutation is random in two "dimensions,"
i.e., it permutes the bits over the bit positions and over time. The second alternative is BICM
with multiple interleavers (M-interleavers, BICM-M) where the interleaver permutes the bits
randomly only over time (and not over the bit positions). This can be seen as a particularization
of the interleaver of BICM-S following an additional constraint: bits from the $k$th encoder's
output must be assigned to the $k$th modulator's input. BICM-M was formally analyzed in [7]
and in fact corresponds to the original model introduced by Zehavi in [4] (BICM) and Li in [10]
(BICM with iterative decoding, BICM-ID). Recently, M-interleavers have also been proven to
be asymptotically optimum for BICM-ID [11]. The last interleaving alternative, on which this
paper focuses, is BICM with a trivial interleaver (BICM-T), i.e., when the interleaver $\Pi$ in Fig. 1
is simply not present [8].

$^1$ The "protection" may be defined in different ways, where probably the simplest one is the bit error probability per bit position
at the demapper's output.
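The three interleaving alternatives differ only in which positions of the $2 \times N$ matrix $\mathbf{C}$ may be exchanged. The following sketch makes that distinction concrete; the function name `interleave`, the mode strings, and the use of a seeded pseudo-random permutation are illustrative assumptions rather than the interleavers used in the simulations.

```python
import random


def interleave(C, mode, seed=0):
    """Permute a 2 x N bit matrix C (a list of two rows).

    'S': one random permutation over all 2N bits (bit positions and time).
    'M': an independent random permutation over time for each row.
    'T': trivial interleaver, the bits are left untouched.
    """
    rng = random.Random(seed)
    n = len(C[0])
    if mode == "T":
        return [row[:] for row in C]
    if mode == "M":
        out = []
        for row in C:
            perm = list(range(n))
            rng.shuffle(perm)
            out.append([row[p] for p in perm])
        return out
    if mode == "S":
        flat = C[0] + C[1]
        perm = list(range(2 * n))
        rng.shuffle(perm)
        mixed = [flat[p] for p in perm]
        return [mixed[:n], mixed[n:]]
    raise ValueError("mode must be 'S', 'M', or 'T'")
```

In mode `'M'` each row keeps its own bits, so the $k$th encoder output always reaches the $k$th modulator input; in mode `'S'` only the total number of ones in the matrix is preserved.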

When BICM-T is considered, the resulting system is the one shown in Fig. 2. A careful
examination of Fig. 2 reveals that the structure of the transmitter of BICM-T is the same as the
transmitter of Ungerboeck's one-dimensional TCM [2] or the TCM transmitter in [12, Fig. 4.17].
The transmitter of BICM-T can also be considered a particular case of the so-called "general
TCM" [13, Fig. 18.11] when $\tilde{k} = k$ (using the notation of [13]) and when the BRGC is used
instead of Ungerboeck's set-partitioning. The receiver of BICM-T in Fig. 2 corresponds to a
conventional BICM receiver, where L-values for each bit are computed and fed to a soft-input
Viterbi decoder (VD). The difference between this receiver's structure and a TCM receiver is
that bit-level processing is used instead of a symbol-by-symbol VD. In conclusion, the BICM-T
system introduced in [8] is simply a BRGC-based TCM transmitter used in conjunction with a
BICM receiver. Nevertheless, throughout this paper, we use the name BICM-T to reflect the fact
that this transmitter/receiver structure can be considered as a particular case of the BICM system
in [4], [5], where the interleaver takes a trivial form.

C. The Decoder

A maximum likelihood sequence decoder (e.g., the VD) chooses the most likely coded sequence
$\hat{\mathbf{C}}$ using the vector of channel observations $\mathbf{y} = [y_1, \ldots, y_N]$ as

$$\hat{\mathbf{C}} = \operatorname*{arg\,max}_{\mathbf{C} \in \mathcal{C}} \left\{ \log \Pr\{\mathbf{C} \,|\, \mathbf{y}\} \right\} \tag{3}$$

$$\phantom{\hat{\mathbf{C}}} = \operatorname*{arg\,max}_{\mathbf{C} \in \mathcal{C}} \left\{ \log \prod_{t=1}^{N} \Pr\{\mathbf{c}_t \,|\, y_t\} \right\}, \tag{4}$$

where $\mathcal{C}$ is the set of all codewords, and where to pass from (3) to (4) we used the memoryless
property of the channel. If we assume that the bits $[c_{1,t}, c_{2,t}]$ are independent, we obtain

$$\log \left( \prod_{t=1}^{N} \Pr\{\mathbf{c}_t \,|\, y_t\} \right) = \log \left( \prod_{k=1}^{2} \prod_{t=1}^{N} \Pr\{c_{k,t} \,|\, y_t\} \right). \tag{5}$$

Under this independence assumption and by using the relation between an L-value $l$ and the
bit's probabilities of being $b \in \{0, 1\}$

$$\Pr\{b \,|\, y\} = \frac{e^{b \cdot l}}{1 + e^{l}}, \tag{6}$$

we obtain

$$\log \left( \prod_{k=1}^{2} \prod_{t=1}^{N} \Pr\{c_{k,t} \,|\, y_t\} \right) = \sum_{k=1}^{2} \sum_{t=1}^{N} \log \Pr\{c_{k,t} \,|\, y_t\}$$

$$\phantom{\log(\cdot)} = \sum_{k=1}^{2} \sum_{t=1}^{N} c_{k,t}\, l_{k,t} - \sum_{k=1}^{2} \sum_{t=1}^{N} \log\left(1 + e^{l_{k,t}}\right). \tag{7}$$

Since the second term in (7) is independent of $\mathbf{C}$, it is irrelevant to the decision of the decoder
in (3). Therefore, the final decision of the decoder can be written as

$$\hat{\mathbf{C}} = \operatorname*{arg\,max}_{\mathbf{C} \in \mathcal{C}} \left\{ \sum_{k=1}^{2} \sum_{t=1}^{N} c_{k,t}\, l_{k,t} \right\}. \tag{8}$$

In a BICM system with convolutional codes, the decoder is implemented using an off-the-shelf
soft-input VD, which assumes that the bits are independent, and thus, uses the relation in (5)
(i.e., it uses the decision rule in (8)). The relation in (5) is indeed valid when BICM-S [5]
or BICM-M [4], [7], [10] configurations are used, since in those cases, the use of a random
interleaver (cf. Sec. II-B) assures that the bits $[c_{1,t}, c_{2,t}]$ are transmitted in different symbols, and
therefore, are affected by different noise realizations.

However, when BICM-T with a soft-input VD is considered, and since the bits $[c_{1,t}, c_{2,t}]$ are
affected by the same noise realization, the relation in (5) does not hold, i.e., the two L-values
passed to the decoder at any time instant $t$ are not independent. Nevertheless, the decoder treats
the bits as independent and still uses the decision rule in (8). In principle, it would be possible
to design a decoder for BICM-T that takes into account this inconsistency, i.e., a decoder that
does not assume independent bits. However, this is out of the scope of this paper and would also
go against the flexibility offered by BICM. Moreover, we will show in the following section that
even with this inconsistency, BICM-T in nonfading channels outperforms BICM-S and BICM-M.
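The step from the bit-wise log-probabilities to the correlation metric $\sum_k \sum_t c_{k,t} l_{k,t}$ can be illustrated numerically. The sketch below is a brute-force search over a small codebook, not a Viterbi decoder; the function names are hypothetical, and the bits and L-values are passed as flat sequences for brevity.

```python
import math


def log_prob(bits, lvals):
    """Sum of log Pr{c|y} over all bits, using Pr{b|y} = e^{b*l} / (1 + e^l)."""
    return sum(b * l - math.log1p(math.exp(l)) for b, l in zip(bits, lvals))


def correlation_metric(bits, lvals):
    """Decoder metric after dropping the codeword-independent term: sum of c*l."""
    return sum(b * l for b, l in zip(bits, lvals))


def decide(codebook, lvals, metric):
    """Exhaustive search for the codeword that maximizes the given metric."""
    return max(codebook, key=lambda c: metric(c, lvals))
```

Because the two metrics differ only by a term that does not depend on the codeword, `decide` returns the same codeword for both, whatever the codebook. This equivalence holds under the independence assumption only, which is the assumption BICM-T violates.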



III. Performance Evaluation

A. BER Performance

Because of the symmetrization of the channel, we can, without loss of generality, assume that
the all-zero codeword was transmitted. We define $\mathcal{E}$ as the set of codewords corresponding to
paths in the trellis of the code diverging from the zero-state at the arbitrarily chosen instant $t = t_0$,
and remerging with it after $T$ trellis stages. We also denote these codewords as $\mathbf{E} = [\mathbf{e}_{t_0}^T, \ldots, \mathbf{e}_{t_0+T-1}^T]$,
where $\mathbf{e}_t = [e_{1,t}, e_{2,t}]$. Then, the bit error rate (BER) can be upper-bounded using a union bound
(UB) as

$$\mathrm{BER} \le \mathrm{UB} \triangleq \sum_{\mathbf{E} \in \mathcal{E}} \mathrm{PEP}(\mathbf{E})\, d_H(\mathbf{i}_\mathbf{E}), \tag{9}$$

where $d_H(\mathbf{i}_\mathbf{E})$ is the Hamming weight of the input sequence $\mathbf{i}_\mathbf{E}$ corresponding to the codeword
$\mathbf{E}$, and the pairwise error probability (PEP) is given by (cf. (8))

$$\mathrm{PEP}(\mathbf{E}) = \Pr\left\{ \sum_{k=1}^{2} \sum_{t=t_0}^{t_0+T-1} e_{k,t}\, l_{k,t} > 0 \right\}. \tag{10}$$

The general expression for the PEP in (10) and the UB in (9) reduce to well-known particular
cases if simplifying assumptions for the distribution of $l_{k,t}$ are adopted.

1) Independent and identically distributed L-values (BICM-S): In BICM-S, the L-values $l_{k,t}$
passed to the decoder are independent and identically distributed (i.i.d.). They can be described
using the conditional pdf $p(\lambda|b)$ with $b \in \{0, 1\}$ and where the pdf is independent of $k$ and $t$.
In this case, the PEP in (10) depends only on the Hamming weight of the codeword $\mathbf{E}$, i.e.,

$$\mathrm{PEP}(\mathbf{E}) = \mathrm{PEP}_S(d_H(\mathbf{E})) = \int_0^\infty \{p(\lambda | b = 0)\}^{*d_H(\mathbf{E})}\, d\lambda. \tag{11}$$

The UB in (9) can be expressed as

$$\mathrm{UB}_S = \sum_{w} \mathrm{PEP}_S(w) \sum_{\mathbf{E} \in \mathcal{D}_w} d_H(\mathbf{i}_\mathbf{E}) \tag{12}$$

$$\phantom{\mathrm{UB}_S} = \sum_{w} \mathrm{PEP}_S(w)\, \beta_w^{\mathcal{C}}, \tag{13}$$

where $\mathcal{D}_w$ represents the set of codewords with Hamming weight $w$, i.e., $\mathcal{D}_w = \{\mathbf{E} \in \mathcal{E} :
d_H(\mathbf{E}) = w\}$. To pass from (12) to (13) we group the codewords $\mathbf{E}$ that have the same Hamming
weight and add their contributions, which results in the well-known weight distribution spectrum
of the code $\beta_w^{\mathcal{C}}$. The expression in (13) is the most common expression for the UB for BICM,
cf. [5, eq. (26)], [6, eq. (4.12)].

2) Independent but not identically distributed L-values (BICM-M): In BICM-M, the L-values
passed to each decoder's input are independent; however, their conditional pdf depends
on the bit's position $k = 1, 2$. Thus, the L-values are modeled by the set of conditional pdfs
$\{p_1(\lambda|b_1), p_2(\lambda|b_2)\}$. The PEP in this case is given by

$$\mathrm{PEP}(\mathbf{E}) = \mathrm{PEP}_M(\tilde{w}_{\mathbf{E},1}, \tilde{w}_{\mathbf{E},2}) = \int_0^\infty \{p_1(\lambda | b_1 = 0)\}^{*\tilde{w}_{\mathbf{E},1}} * \{p_2(\lambda | b_2 = 0)\}^{*\tilde{w}_{\mathbf{E},2}}\, d\lambda, \tag{14}$$

where $\tilde{w}_{\mathbf{E},k}$ is the Hamming weight of the $k$th row of $\mathbf{E}$. The UB in (9) can be expressed as

$$\mathrm{UB}_M = \sum_{w_1, w_2} \mathrm{PEP}_M(w_1, w_2) \sum_{\mathbf{E} \in \mathcal{D}_{w_1,w_2}} d_H(\mathbf{i}_\mathbf{E}) = \sum_{w_1, w_2} \mathrm{PEP}_M(w_1, w_2)\, \beta^{\mathcal{C}}_{w_1,w_2}, \tag{15}$$

where $\mathcal{D}_{w_1,w_2}$ is the set of codewords with generalized Hamming weight $[w_1, w_2]$ ($w_k$ in its $k$th
row), i.e., $\mathcal{D}_{w_1,w_2} = \{\mathbf{E} \in \mathcal{E} : w_1 = \tilde{w}_{\mathbf{E},1}, w_2 = \tilde{w}_{\mathbf{E},2}\}$, and $\beta^{\mathcal{C}}_{w_1,w_2}$ is the generalized weight
distribution spectrum of the code that takes into account the errors at each encoder's output
separately. The UB in (15) was shown in [7] to be useful when analyzing the UEP introduced
by the binary labeling and also to optimize the interleaver and the code.

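3) BICM without bit-interleaving (BICM-T): For BICM-T, yet a different particularization of
(10) must be adopted. Let $\Lambda_\mathbf{E}$ be the metric associated to the codeword $\mathbf{E}$ and assume without
loss of generality that $t_0 = 1$. This metric is a sum of independent random variables, i.e.,

$$\Lambda_\mathbf{E} = \Lambda_1 + \Lambda_2 + \Lambda_3 + \cdots, \tag{16}$$

where $\Lambda_t = e_{1,t}\, l_{1,t} + e_{2,t}\, l_{2,t}$ corresponds to the elements defining the PEP in (10). We then
express the $t$th metric as

$$\Lambda_t(\mathbf{e}_t, \mathbf{s}_t) = \begin{cases}
0, & \text{if } \mathbf{e}_t = [0, 0] \\
(-1)^{s_{1,t}}\, \tilde{l}_{1,t}, & \text{if } \mathbf{e}_t = [1, 0] \\
(-1)^{s_{2,t}}\, \tilde{l}_{2,t}, & \text{if } \mathbf{e}_t = [0, 1] \\
\sum_{k=1}^{2} (-1)^{s_{k,t}}\, \tilde{l}_{k,t}, & \text{if } \mathbf{e}_t = [1, 1]
\end{cases} \tag{17}$$

where we use $\Lambda_t(\mathbf{e}_t, \mathbf{s}_t)$ to show that $\Lambda_t$ depends on the scrambling's outcome $\mathbf{s}_t$ (through $\tilde{l}_{k,t}$)
and the error pattern at time $t$, $\mathbf{e}_t$.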

Since $\tilde{l}_{k,t}$ are random variables (that depend on $k$ and $x_t$), according to (17), there exist
three pdfs that can be used to model the individual metrics in (16). We denote the set of these
three conditional pdfs by $\{p_1(\lambda|b_1), p_2(\lambda|b_2), p_\Sigma(\lambda|\mathbf{b})\}$, for the three relevant cases defined in
(17), respectively. We note that $p_\Sigma(\lambda|\mathbf{b})$ is conditioned not only on one bit, but on the pair of
transmitted bits $\mathbf{b} = [b_1, b_2]$, where $b_1$, $b_2$, and $\mathbf{b}$ represent the bits $c_{1,t}$, $c_{2,t}$, and $\mathbf{c}_t$, respectively.
From (16), and due to the independence of the individual metrics, the PEP in (10) can be
expressed as

$$\mathrm{PEP}(\mathbf{E}) = \mathrm{PEP}_T(w_{\mathbf{E},1}, w_{\mathbf{E},2}, w_{\mathbf{E},\Sigma}) = \int_0^\infty \{p_1(\lambda | b_1 = 0)\}^{*w_{\mathbf{E},1}} * \{p_2(\lambda | b_2 = 0)\}^{*w_{\mathbf{E},2}} * \{p_\Sigma(\lambda | \mathbf{b} = [0, 0])\}^{*w_{\mathbf{E},\Sigma}}\, d\lambda, \tag{18}$$

where $w_{\mathbf{E},k}$ is the number of columns in $\mathbf{E}$ where only the $k$th row of $\mathbf{E}$ is one, and $w_{\mathbf{E},\Sigma}$ is the
number of columns in $\mathbf{E}$ where both entries are equal to one. Clearly

$$d_H(\mathbf{E}) = w_{\mathbf{E},1} + w_{\mathbf{E},2} + 2 w_{\mathbf{E},\Sigma}. \tag{19}$$

Example 1 (Error event at minimum Hamming distance of (5, 7) code): Consider the constraint
length $K = 3$ optimum distance spectrum convolutional code (ODSCC) with polynomial generators
(5, 7) [14, Table I]. The free distance of the code is $d_H^{\text{free}} = 5$, and $\beta_5^{\mathcal{C}} = 1$, i.e., there is one
divergent path at Hamming distance five from the all-zero codeword, and the Hamming weight
of the input sequence of that path is $d_H(\mathbf{i}_\mathbf{E}) = 1$. Moreover, it is possible to show that this codeword is

$$\mathbf{E} = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix},$$

i.e., $d_H(\mathbf{E}) = 5$, $w_{\mathbf{E},1} = 0$, $w_{\mathbf{E},2} = 1$, and $w_{\mathbf{E},\Sigma} = 2$. Also, $\tilde{w}_{\mathbf{E},1} = 2$ and $\tilde{w}_{\mathbf{E},2} = 3$.

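We define $\mathcal{D}_{w_1,w_2,w_\Sigma}$ as the set of codewords $\mathbf{E}$ with $w_1$ columns such that $\mathbf{e}_t = [1, 0]$, $w_2$
columns with $\mathbf{e}_t = [0, 1]$, and $w_\Sigma$ columns with $\mathbf{e}_t = [1, 1]$, i.e., $\mathcal{D}_{w_1,w_2,w_\Sigma} = \{\mathbf{E} \in \mathcal{E} : w_1 =
w_{\mathbf{E},1}, w_2 = w_{\mathbf{E},2}, w_\Sigma = w_{\mathbf{E},\Sigma}\}$. Using this, the UB expression in (9) for BICM-T is given by

$$\mathrm{UB}_T = \sum_{w_1, w_2, w_\Sigma} \mathrm{PEP}_T(w_1, w_2, w_\Sigma) \sum_{\mathbf{E} \in \mathcal{D}_{w_1,w_2,w_\Sigma}} d_H(\mathbf{i}_\mathbf{E}) = \sum_{w_1, w_2, w_\Sigma} \mathrm{PEP}_T(w_1, w_2, w_\Sigma)\, \beta^{\mathcal{C}}_{w_1,w_2,w_\Sigma}, \tag{20}$$

where $\beta^{\mathcal{C}}_{w_1,w_2,w_\Sigma}$ is a weight distribution spectrum of the code $\mathcal{C}$ that not only considers the
generalized weight $[w_1, w_2]$ of the codewords, but also takes into account the temporal behavior,
i.e., it considers the case when $\mathbf{e}_t = [1, 1]$ as a different kind of event. This differs from $\beta^{\mathcal{C}}_{w_1,w_2}$,
where such an event will be simply considered as an extra contribution to the total generalized
weight.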
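The column counts $(w_{\mathbf{E},1}, w_{\mathbf{E},2}, w_{\mathbf{E},\Sigma})$ of Example 1 can be reproduced with a short script. This is a minimal sketch that assumes a feedforward rate-1/2 encoder with the input bit entering at the most significant register position; the helper names `encode` and `column_profile` are hypothetical.

```python
def encode(bits, gens=(0o5, 0o7), K=3):
    """Rate-1/2 feedforward convolutional encoder (generators in octal).

    Returns the 2 x N codeword matrix as two rows. The input is followed by
    K-1 zeros so that the trellis path remerges with the zero state.
    """
    state = 0
    rows = ([], [])
    for u in list(bits) + [0] * (K - 1):
        reg = (u << (K - 1)) | state          # [u_t, u_{t-1}, ..., u_{t-K+1}]
        for row, g in zip(rows, gens):
            row.append(bin(reg & g).count("1") % 2)
        state = reg >> 1                      # shift the register by one bit
    return [list(r) for r in rows]


def column_profile(E):
    """Return (w1, w2, wS): number of columns equal to [1,0], [0,1], [1,1]."""
    cols = list(zip(*E))
    return (cols.count((1, 0)), cols.count((0, 1)), cols.count((1, 1)))
```

With these definitions, `encode([1])` yields the rows `[1, 0, 1]` and `[1, 1, 1]`, and `column_profile` of that matrix is `(0, 1, 2)`, in agreement with Example 1 and with $d_H(\mathbf{E}) = w_{\mathbf{E},1} + w_{\mathbf{E},2} + 2 w_{\mathbf{E},\Sigma} = 5$.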

B. PDF of the L-values 

In order to calculate the PEP for BICM-T in (18), we need to compute the set of conditional
pdfs $\{p_1(\lambda|b_1), p_2(\lambda|b_2), p_\Sigma(\lambda|\mathbf{b})\}$. In this subsection we show how to find approximations for
these pdfs.

The L-values in (2) can be expressed as

$$\tilde{l}_{k,t} = \log \frac{\sum_{x \in \mathcal{X}_{k,1}} p(y_t | x)}{\sum_{x \in \mathcal{X}_{k,0}} p(y_t | x)}, \tag{21}$$

where $\mathcal{X}_{k,b}$ is the set of constellation symbols labeled with $b$ at bit position $k$. Using the fact that
the channel is Gaussian and if the so-called max-log approximation $\log(e^a + e^b) \approx \max\{a, b\}$
is used, the L-values can be expressed as

$$\tilde{l}_{k,t}(y_t | \mathbf{s}_t) \approx \gamma \left[ \min_{x \in \mathcal{X}_{k,0}} (y_t - x)^2 - \min_{x \in \mathcal{X}_{k,1}} (y_t - x)^2 \right], \tag{22}$$

where from now on we use the notation $\tilde{l}_{k,t}(y_t | \mathbf{s}_t)$ to emphasize that the L-values depend on the
received signal and the scrambler's outcome $\mathbf{s}_t$. In fact, the L-values depend on the transmitted
symbol $x_t$; however, since $\mathbf{c}_t = [0, 0]$ and no interleaving is performed, $x_t$ is completely
determined by $\mathbf{s}_t$.
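The max-log L-values are easy to evaluate directly, which helps to see their piece-wise linear shape. The sketch below assumes the BRGC-labeled 4-PAM constellation with $\Delta = 1/\sqrt{5}$; the name `maxlog_l_value` is hypothetical.

```python
import math

DELTA = 1 / math.sqrt(5)
# BRGC-labeled 4-PAM: label (bit at k=1, bit at k=2) -> symbol.
LABELS = {(1, 1): -3 * DELTA, (1, 0): -DELTA, (0, 0): DELTA, (0, 1): 3 * DELTA}


def maxlog_l_value(y, k, gamma):
    """Max-log L-value of bit position k: gamma * (min d0 - min d1).

    d0 (d1) is the squared distance from y to the closest symbol whose
    k-th bit is 0 (1).
    """
    d0 = min((y - x) ** 2 for bits, x in LABELS.items() if bits[k - 1] == 0)
    d1 = min((y - x) ** 2 for bits, x in LABELS.items() if bits[k - 1] == 1)
    return gamma * (d0 - d1)
```

Evaluated at the noiseless observation $y_t = \Delta$, both bit positions give $-4\gamma\Delta^2$, i.e., a negative value that correctly favors the transmitted bit zero.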

The L-value in (22) is a piece-wise linear function of $y_t$. Moreover, the L-values $\Lambda_t(\mathbf{e}_t, \mathbf{s}_t)$ in
(17) are linear combinations of $\tilde{l}_{k,t}(y_t | \mathbf{s}_t)$ in (22), and therefore, they are also piece-wise linear
functions of $y_t$. Two cases are of particular interest, namely, when $\mathbf{e}_t = [1, 0]$ or $\mathbf{e}_t = [0, 1]$,
and when $\mathbf{e}_t = [1, 1]$. The piece-wise linear relationships for the first case are shown in Fig. 3(a)
for 4-PAM. In this figure we also show the constellation symbols and we use the notation
$\mathbf{s}_t = [0/1, :]$ and $\mathbf{s}_t = [:, 0/1]$ to show that for $\mathbf{e}_t = [1, 0]$ and $\mathbf{e}_t = [0, 1]$ the L-values $\Lambda_t(\mathbf{e}_t, \mathbf{s}_t)$
are independent of $s_{2,t}$ and $s_{1,t}$, respectively. In Fig. 3(b), the four possible cases when $\mathbf{e}_t = [1, 1]$
are shown.

December 9, 2010 DRAFT

For a given transmitted symbol $x_t$ (determined by $\mathbf{s}_t$), the received signal $y_t$ is a Gaussian
random variable with mean $x_t$ and variance $N_0/2$. Therefore, the pdf of each L-value $\Lambda_t(\mathbf{e}_t, \mathbf{s}_t)$ in (17) is a
sum of piece-wise Gaussian functions$^2$. In order to obtain expressions that are easy to work with,
we use the so-called zero-crossing approximation of the L-values proposed in [15, Sec. III-C],
which replaces all the Gaussian pieces required in the max-log model of L-values by a single
Gaussian function. Intuitively, this approximation states that

$$\Lambda_t(y_t | \mathbf{e}_t, \mathbf{s}_t) \approx a(\mathbf{e}_t, \mathbf{s}_t)\, y_t + b(\mathbf{e}_t, \mathbf{s}_t), \tag{23}$$

where $a(\mathbf{e}_t, \mathbf{s}_t)$ and $b(\mathbf{e}_t, \mathbf{s}_t)$ are the slope and the free coefficient of the closest linear piece to
the transmitted symbol $x_t$.

In Table I we show the values of $a(\mathbf{e}_t, \mathbf{s}_t)$ and $b(\mathbf{e}_t, \mathbf{s}_t)$ defining (23) for 4-PAM, where for
notational simplicity we have defined

$$\alpha \triangleq 4\gamma\Delta^2. \tag{24}$$

To clarify how these coefficients are obtained, consider for example $\mathbf{e}_t = [0, 1]$. In this case, for
$\mathbf{s}_t = [1, 1]$, which corresponds to $x_t = -3\Delta$, the closest linear piece intersecting the x-axis is
the left-most part of the curve labeled in Fig. 3 by $\mathbf{e}_t = [0, 1]$ and $\mathbf{s}_t = [:, 1]$ (dashed-dotted line).
If for example $\mathbf{e}_t = [0, 1]$ and $\mathbf{s}_t = [0, 0]$ ($x_t = \Delta$), the closest linear piece is the right-most
piece labeled by $\mathbf{e}_t = [0, 1]$ and $\mathbf{s}_t = [:, 0]$ (dashed line). All the other values in Table I can be
found by a similar direct inspection of Fig. 3.

Using the approximation in (23), the L-values can be modeled as Gaussian random variables
whose mean and variance depend on $\mathbf{s}_t$, $\gamma$, and $\mathbf{e}_t$, i.e.,

$$p_{\Lambda_t}(\lambda | \mathbf{e}_t, \mathbf{s}_t) = \psi(\lambda; \mu(\mathbf{e}_t, \mathbf{s}_t), \sigma^2(\mathbf{e}_t, \mathbf{s}_t)), \tag{25}$$

where the mean value and variance are given by

$$\mu(\mathbf{e}_t, \mathbf{s}_t) = x_t\, a(\mathbf{e}_t, \mathbf{s}_t) + b(\mathbf{e}_t, \mathbf{s}_t), \tag{26}$$

$$\sigma^2(\mathbf{e}_t, \mathbf{s}_t) = [a(\mathbf{e}_t, \mathbf{s}_t)]^2\, \frac{N_0}{2}. \tag{27}$$

In Table II we show the obtained mean values and variances for the same cases presented in
Table I.

To obtain the pdf of $\Lambda_t$ in (17), we simply average (25) over the symbols, which are assumed
to be equiprobable. This results in the following expression

$$p_{\Lambda_t}(\lambda) = \begin{cases}
\frac{1}{2}\left[\psi(\lambda; -3\alpha, 2\alpha) + \psi(\lambda; -\alpha, 2\alpha)\right], & \text{if } \mathbf{e}_t = [1, 0] \\
\psi(\lambda; -\alpha, 2\alpha), & \text{if } \mathbf{e}_t = [0, 1] \\
\psi(\lambda; -4\alpha, 8\alpha), & \text{if } \mathbf{e}_t = [1, 1]
\end{cases} \tag{28}$$

$^2$ Closed-form expressions for the pdfs of $\Lambda_t(\mathbf{e}_t, \mathbf{s}_t)$ when $\mathbf{e}_t = [1, 0]$ and $\mathbf{e}_t = [0, 1]$ (cf. Fig. 3(a)) were presented in [15].
IV. Discussion and Applications 

In the previous section, we developed approximations for the pdf of the L-values passed to the
decoder in BICM-T. In this section we use them to quantify the gains offered by BICM-T over
BICM-S, to define asymptotically optimum CCs, and to compare BICM-T with Ungerboeck's 
1D-TCM. 



A. Performance of BICM-T 

Expression (28) shows the pdf of the L-values needed to compute the UB of BICM-T, cf. (18)
and (20). Moreover, due to the simplifications introduced in the previous subsections, the results
in (28) only involve Gaussian pdfs, which greatly simplifies the PEP computation in (18).

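Theorem 1: The UB for BICM-T is

$$\mathrm{UB}_T = \sum_{w_1, w_2, w_\Sigma} \beta^{\mathcal{C}}_{w_1,w_2,w_\Sigma} \left(\frac{1}{2}\right)^{w_1} \sum_{j=0}^{w_1} \binom{w_1}{j}\, Q\left( \sqrt{\frac{2\gamma\,(w_1 + w_2 + 4w_\Sigma + 2j)^2}{5\,(w_1 + w_2 + 4w_\Sigma)}} \right). \tag{29}$$

Proof: Inserting (28) in (18), changing the convolution of sums into a sum of convolutions,
and using $\psi(\lambda; \mu_1, \sigma_1^2) * \cdots * \psi(\lambda; \mu_J, \sigma_J^2) = \psi\left(\lambda; \sum_{j=1}^{J} \mu_j, \sum_{j=1}^{J} \sigma_j^2\right)$, the PEP in (18) can be
expressed as

$$\mathrm{PEP}_T(w_1, w_2, w_\Sigma) = \left(\frac{1}{2}\right)^{w_1} \sum_{j=0}^{w_1} \binom{w_1}{j} \int_0^\infty \psi(\lambda; \mu_{1,2,\Sigma,j}, \sigma^2_{1,2,\Sigma})\, d\lambda, \tag{30}$$

where

$$\mu_{1,2,\Sigma,j} = -(w_1 + w_2 + 4w_\Sigma + 2j)\,\alpha \tag{31}$$

$$\sigma^2_{1,2,\Sigma} = 2(w_1 + w_2 + 4w_\Sigma)\,\alpha. \tag{32}$$

By using the definition of $\alpha$ in (24) and $\Delta$ in (1), and (31) and (32) in (30), and the UB definition
in (20), the expression in (29) is obtained. ■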
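The PEP and UB expressions of Theorem 1 are straightforward to evaluate numerically. The sketch below assumes $\alpha = 4\gamma\Delta^2$ with $\Delta^2 = 1/5$ and uses the identity $\int_0^\infty \psi(\lambda; \mu, \sigma^2)\, d\lambda = Q(-\mu/\sigma)$; the function names and the dictionary layout of the spectrum are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def pep_t(w1, w2, ws, gamma):
    """PEP of BICM-T from the Gaussian mixture model, for w1 + w2 + ws > 0."""
    alpha = 4 * gamma / 5                     # alpha = 4 * gamma * Delta^2
    base = w1 + w2 + 4 * ws
    sigma = math.sqrt(2 * base * alpha)       # standard deviation of the sum
    total = 0.0
    for j in range(w1 + 1):
        mean = -(base + 2 * j) * alpha        # j terms of the mixture at -3*alpha
        total += math.comb(w1, j) * q_function(-mean / sigma)
    return total / 2 ** w1


def ub_t(beta, gamma):
    """Union bound: beta maps (w1, w2, ws) to the spectrum coefficient."""
    return sum(b * pep_t(w1, w2, ws, gamma) for (w1, w2, ws), b in beta.items())
```

For the error event of Example 1, `pep_t(0, 1, 2, gamma)` reduces to a single term $Q(\sqrt{2 \cdot 9\gamma/5})$, while `pep_t(1, 0, 2, gamma)` averages that term with a smaller one and is therefore lower.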

In Fig. 4, numerical results for BICM-T with 4-PAM labeled with the BRGC and using
the ODSCCs (5, 7) ($K = 3$) and (247, 371) ($K = 8$) [14, Table I] are shown. For BICM-
M two configurations are considered for each code. The first one is when all the bits from 
the first encoder's output are assigned to the first modulator's input and all the bits from the 
second encoder's output are sent to the second modulator's input. The second alternative simply 
corresponds to the opposite, i.e., all the bits from the first encoder's output are sent over k = 2 
and the bits from the second encoder's output are sent over k = 1. This is equivalent to defining 
the code by simply swapping the order of the polynomial generators. For these two particular 
codes, the configuration that minimizes the BER for medium to high SNR is the second one, 
i.e., when all the bits generated by the polynomial (7) or (371) are sent over k = 1 and all the 
bits generated by the polynomial (5) or (247) are sent over k = 2. We denote the configuration 
that minimizes (or maximizes) the BER by "Best" (or "Worst"). 

To compute the UB for BICM-S and BICM-M, we use the expressions in [7, eq. (22)-(23)],
and for BICM-T we use Theorem 1. All the UB computations were carried out considering a
truncated spectrum of the code, i.e., $\{w, w_1, w_2, w_\Sigma\} < 30$, which is calculated numerically using
a breadth-first search algorithm [16]. The results in Fig. 4 show that the UB developed in this
paper for BICM-T predicts the simulation results well. Also, these results show that for these
particular codes, the gains obtained by using BICM-M instead of BICM-S are small, although
larger gains were obtained in [7] for other codes/configurations. On the other hand, the gains by
using BICM-T instead of BICM-S for a BER target of $10^{-7}$ are approximately 2 dB for $K = 3$
and 1 dB for $K = 8$. Moreover, these gains are obtained by decreasing the complexity of the
system, i.e., by not doing interleaving/de-interleaving.

B. Asymptotic Performance

In this subsection, we analyze the performance of BICM-T for asymptotically high SNR and 
we compare it with BICM-S. 

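Theorem 2: The asymptotic UB for BICM-T and a given code $\mathcal{C}$ can be expressed as

$$\mathrm{UB}'_T = M_\mathcal{C}\, Q\left( \sqrt{\frac{2 A_\mathcal{C}\, \gamma}{5}} \right), \tag{33}$$

where

$$A_\mathcal{C} = \min_{\substack{w_1, w_2, w_\Sigma \\ \beta^{\mathcal{C}}_{w_1,w_2,w_\Sigma} \neq 0}} (w_1 + w_2 + 4w_\Sigma) \tag{34}$$

$$M_\mathcal{C} = \sum_{\substack{w_1, w_2, w_\Sigma \\ w_1 + w_2 + 4w_\Sigma = A_\mathcal{C}}} \beta^{\mathcal{C}}_{w_1,w_2,w_\Sigma} \left(\frac{1}{2}\right)^{w_1}. \tag{35}$$

Proof: The UB in (29) is a sum of weighted Q-functions. For high SNR, and for each
$(w_1, w_2, w_\Sigma)$, there is a Q-function that dominates the inner sum in (29). This is obtained for
$j = 0$, which completes the proof. ■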
For comparison purposes, we present here the performance of BICM-S at asymptotically high
SNR. This can be obtained for example by particularizing [7, eq. (25)] to the conventional BICM
configuration with one single interleaver. The asymptotic performance of BICM-S is given by

$$\mathrm{UB}'_S = \beta^{\mathcal{C}}_{d_H^{\text{free}}} \left(\frac{3}{4}\right)^{d_H^{\text{free}}} Q\left( \sqrt{\frac{2\, d_H^{\text{free}}\, \gamma}{5}} \right), \tag{36}$$

where $d_H^{\text{free}}$ is the free Hamming distance of the code, which can be expressed as $d_H^{\text{free}} = w_1^{\text{free}} +
w_2^{\text{free}} + 2w_\Sigma^{\text{free}}$, cf. (19).

In Fig. 4 we show asymptotic UBs for $K = 3$. For BICM-T we use Theorem 2, for BICM-S
we use (36), and for BICM-M we use [7, eq. (25)]. All of them are shown to follow the
simulation results quite well. Similar results can be obtained for the code with $K = 8$; however,
we do not show those results so as not to overcrowd the figure.

The asymptotic gain (AG) obtained by using BICM-T instead of BICM-S follows directly 
from Theorem 2 and (36), as stated in the following corollary. 

Corollary 3: The AG obtained by using BICM-T instead of BICM-S is 

$$\mathrm{AG}_{S \to T} = 10 \log_{10}\left(\frac{A_c}{d_H^{\mathrm{free}}}\right). \tag{37}$$

Example 2 (AG for the (5, 7) code): For the particular code (5, 7), it is possible to see that 
the solution of (34) corresponds to the event at minimum Hamming distance$^3$, i.e., $d_H^{\mathrm{free}} = 5$, 
$w_{e,1} = 0$, $w_{e,2} = 1$, $w_{e,3} = 2$ (cf. Example 1), and therefore, $A_c = 9$. This results in an AG 
of $10\log_{10}(9/5) \approx 2.55$ dB. Moreover, since the input sequence that generates the codeword at 
minimum Hamming distance has Hamming weight one ($\beta_{0,1,2} = 1$), we obtain $M_{0,1,2} = 1$ for 
the configuration "Worst". If the polynomials are swapped (which corresponds to swapping the 
rows of E), i.e., if we consider the code (7, 5), we obtain $w_{e,1} = 1$, $w_{e,2} = 0$, $w_{e,3} = 2$ and 
the same $A_c$ (since $A_c$ does not depend on the order of the polynomials). However, in this case 
$M_{1,0,2} = 1/2$. These two asymptotic bounds are shown in Fig. 4, where the influence of the 
coefficient $M_c$ can be observed in both numerical results and asymptotic bounds. 
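
The numbers in Example 2 can be checked with a short trellis search. The following Python sketch is illustrative only and rests on assumptions that are not stated in this form in the text: it takes the solution of (34) to be the smallest value of $w_1 + w_2 + 4w_3$ over all error events (a reading that is consistent with $d_H^{\mathrm{free}} = 5$ and $A_c = 9$ for the (5, 7) code), it takes the generators in octal with the most significant bit acting on the current input bit, and the names `min_event_cost` and `asymptotic_gain_db` are hypothetical helpers.

```python
import heapq
import math

# Cost of one trellis branch as a function of the error pattern [c1, c2].
# HAMMING_COST accumulates w1 + w2 + 2*w3 (Hamming distance);
# BICM_T_COST accumulates w1 + w2 + 4*w3 (assumed form of the A_c metric).
HAMMING_COST = {(0, 0): 0, (1, 0): 1, (0, 1): 1, (1, 1): 2}
BICM_T_COST = {(0, 0): 0, (1, 0): 1, (0, 1): 1, (1, 1): 4}


def parity(x: int) -> int:
    """Return the modulo-2 sum of the bits of x."""
    return bin(x).count("1") & 1


def min_event_cost(g1: int, g2: int, K: int, cost: dict) -> int:
    """Smallest accumulated branch cost over all error events of a rate-1/2
    code (g1, g2) with constraint length K, found with Dijkstra's algorithm.
    An error event leaves the zero state and ends at its first return to it."""
    mem = K - 1

    def step(state: int, u: int):
        reg = (u << mem) | state  # shift register, newest bit first
        out = (parity(reg & g1), parity(reg & g2))
        return reg >> 1, cost[out]

    # The event starts by leaving the zero state with input bit 1.
    state, c0 = step(0, 1)
    best = {state: c0}
    heap = [(c0, state)]
    while heap:
        c, s = heapq.heappop(heap)
        if s == 0:
            return c  # first (and cheapest) return to the zero state
        if c > best[s]:
            continue  # stale heap entry
        for u in (0, 1):
            nxt, bc = step(s, u)
            if c + bc < best.get(nxt, math.inf):
                best[nxt] = c + bc
                heapq.heappush(heap, (c + bc, nxt))
    raise ValueError("no error event returns to the zero state")


def asymptotic_gain_db(a_c: float, d_free: float) -> float:
    """AG of BICM-T over BICM-S as in (37), in dB."""
    return 10.0 * math.log10(a_c / d_free)


if __name__ == "__main__":
    for g1, g2 in ((0o5, 0o7), (0o7, 0o5)):
        d_free = min_event_cost(g1, g2, 3, HAMMING_COST)
        a_c = min_event_cost(g1, g2, 3, BICM_T_COST)
        print(f"({g1:o}, {g2:o}): d_free={d_free}, A_c={a_c}, "
              f"AG={asymptotic_gain_db(a_c, d_free):.2f} dB")
```

Because the branch costs are symmetric in the two coded bits, the sketch returns the same value for (5, 7) and (7, 5), in line with the remark that $A_c$ does not depend on the order of the polynomials. It does not compute the multiplicity $M_c$, which is what distinguishes the two codes.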



C. Asymptotically Optimum Convolutional Codes 

Optimum CCs are usually defined in terms of minimum distance, i.e., good CCs are the ones that, 
for a given rate and constraint length, have the maximum free distance (MFD) [17, Sec. 8.2.5]. 
The MFD criterion can be refined if the multiplicities associated with the different weights are 
considered [14], [13, Sec. 12.3]. This optimality criterion resulted in the ODSCCs, which are 
optimal both in binary transmission and in BICM-S, cf. (36). For BICM-M, we have shown in [7] 
that $d_H^{\mathrm{free}}$ is still a good indicator of the optimality of the code (as well as its multiplicity); 
however, a generalized weight distribution spectrum of the code should be considered, cf. w 
in (15). If BICM-T is considered, and as a direct consequence of Theorem 2, asymptotically 
optimum convolutional codes (AOCCs) can be defined. 

Definition 1 (Asymptotically optimum convolutional codes for BICM-T): A CC is said to be 
an AOCC if among all codes with the same K and R = 1/2 it has the highest $A_c$, and among 
all codes with the same K and R = 1/2 it has the lowest multiplicity $M_c$. 

We have performed an exhaustive numerical search for AOCCs based on Definition 1. We 
considered constraint lengths K = 3, 4, ..., 8 and all codes whose free distance does not exceed 
the free distance of the ODSCC. The spectrum was truncated as $w_1 + w_2 + 4w_3 < d_H^{\mathrm{free}} + 8$, 
and the search was performed in lexicographic order. The results are shown in Table III, 
where we also include the ODSCCs for comparison. If there exists more than one AOCC for a 
given K, we present the first one in the list. These results show that, in general, the minimum free 
distance of the code is not the proper criterion in BICM-T, i.e., codes that are not MFD codes 
perform better than the ODSCCs, cf. K = 6, 7. In fact, only for K = 3 is the ODSCC also 
optimum for BICM-T$^4$. In this table we also present the AG that BICM-T offers with respect 
to BICM-S. The values obtained are around 2 dB. 

3 However, this is not always true for other codes. 
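
A search of this kind can be sketched in a few lines of Python for small K. The sketch below is a simplified illustration, not the procedure used to produce Table III: it assumes that $A_c$ is the smallest value of $w_1 + w_2 + 4w_3$ over all error events, it ranks codes by $A_c$ only (the multiplicity $M_c$ and the spectrum truncation are not implemented, so ties are simply listed in lexicographic order), it does not exclude catastrophic codes, and the names `a_c` and `search_max_a_c` are hypothetical.

```python
import heapq
from itertools import product

# Assumed branch cost per error pattern [c1, c2]: accumulates w1 + w2 + 4*w3.
COST = {(0, 0): 0, (1, 0): 1, (0, 1): 1, (1, 1): 4}


def parity(x: int) -> int:
    """Return the modulo-2 sum of the bits of x."""
    return bin(x).count("1") & 1


def a_c(g1: int, g2: int, K: int) -> int:
    """Smallest w1 + w2 + 4*w3 over all error events of the code (g1, g2),
    computed as a shortest path from and back to the zero state."""
    mem = K - 1

    def step(state: int, u: int):
        reg = (u << mem) | state  # newest input bit is the MSB
        return reg >> 1, COST[(parity(reg & g1), parity(reg & g2))]

    state, cost = step(0, 1)  # leave the zero state
    best = {state: cost}
    heap = [(cost, state)]
    while heap:
        cost, state = heapq.heappop(heap)
        if state == 0:
            return cost
        if cost > best[state]:
            continue  # stale heap entry
        for u in (0, 1):
            nxt, c = step(state, u)
            if cost + c < best.get(nxt, float("inf")):
                best[nxt] = cost + c
                heapq.heappush(heap, (cost + c, nxt))
    raise ValueError("zero state not reachable")


def search_max_a_c(K: int):
    """Visit all generator pairs in lexicographic order and return the
    largest A_c together with every pair (as integers) that attains it."""
    best, winners = -1, []
    for g1, g2 in product(range(1, 1 << K), repeat=2):
        value = a_c(g1, g2, K)
        if value > best:
            best, winners = value, [(g1, g2)]
        elif value == best:
            winners.append((g1, g2))
    return best, winners


if __name__ == "__main__":
    best, winners = search_max_a_c(3)
    print(best, [(f"{g1:o}", f"{g2:o}") for g1, g2 in winners])
```

For K = 3 the sketch finds the largest value for the generator pairs (5, 7) and (7, 5), which matches $A_c = 9$ in Example 2. Selecting among such ties requires the multiplicity $M_c$ as in Definition 1.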

In Fig. 5 we show the values of $A_c$ and $M_c$ for all the possible codes with K = 5. This 
figure shows that for K = 5 the AOCC and the ODSCC have the same asymptotic performance 
(same $A_c$); however, the multiplicity of the AOCC is smaller. In Fig. 6 we show similar results 
for K = 6, where we only show a subset of all the possible codes. This figure shows that 
for K = 6, the ODSCC gives a worse asymptotic performance (smaller $A_c$) than the 
AOCC. Moreover, the AOCC in this case has $d_H^{\mathrm{free}} = 7$ while the ODSCC has $d_H^{\mathrm{free}} = 8$, 
cf. Table III. The same phenomenon occurs for K = 7. 

D. BICM-T vs. TCM 

As mentioned in Sec. III-B, the transmitter of BICM-T is identical to the transmitter of 
Ungerboeck's 1D-TCM. In this subsection we compare their asymptotic performance. 

We have previously defined in (37) the AG of BICM-T over BICM-S. It is also possible to 
define the AG of BICM-T compared to uncoded transmission with the same spectral efficiency 
(uncoded 2-PAM). Since the minimum squared Euclidean distance of the 2-PAM constellation 
is 4, the AG is given by 

$$\mathrm{AG}_{\mathrm{UC} \to T} = 10 \log_{10}\left(\frac{A_c}{5}\right). \tag{38}$$

The AG in (38) is tabulated in the last column of Table III. For K = 3, $\mathrm{AG}_{\mathrm{UC} \to T}$ is equal 
to 2.55 dB, which is the same as $\mathrm{AG}_{S \to T}$. This is because BICM-S with K = 3 does not offer 
any AG compared to uncoded 2-PAM. Analyzing the results in the last column of Table III, we 
find that they are the same as those obtained by 1D-TCM, cf. [2, Table I]. This simply states that 
if BICM-T is used with the correct CC, it performs asymptotically as well as 1D-TCM, and 
therefore, it should be considered as a good alternative for CM in nonfading channels. However, 
this is not the case if BICM-S is used, or if BICM-T is used with the ODSCCs. 
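
The relation between the two gains for K = 3 can be made explicit with a short worked calculation. The decomposition below is only an algebraic rearrangement of (37) and (38); reading the last term as the gain of BICM-S over uncoded 2-PAM is an interpretation (gains in dB adding along the chain uncoded, BICM-S, BICM-T), and it assumes that the same code is used for BICM-S and BICM-T.

```latex
\begin{align*}
\mathrm{AG}_{\mathrm{UC} \to T}
  &= 10 \log_{10}\frac{A_c}{5}
   = 10 \log_{10}\frac{A_c}{d_H^{\mathrm{free}}}
     + 10 \log_{10}\frac{d_H^{\mathrm{free}}}{5}
   = \mathrm{AG}_{S \to T} + 10 \log_{10}\frac{d_H^{\mathrm{free}}}{5}, \\
\text{for the (5, 7) code } (A_c = 9,\ d_H^{\mathrm{free}} = 5):\quad
\mathrm{AG}_{\mathrm{UC} \to T}
  &= 10 \log_{10}\frac{9}{5} + 10 \log_{10}\frac{5}{5}
   \approx 2.55~\text{dB} + 0~\text{dB}.
\end{align*}
```

The second term vanishes exactly when $d_H^{\mathrm{free}} = 5$, which is why the two gains coincide for K = 3.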

4 For K = 4 the AOCC (13, 17) has, in fact, the same spectrum $\beta_{w_1,w_2,w_3}$ as the ODSCC (15, 17). The AOCC appears 
in the list because of the lexicographic order search. 
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V. Conclusions 

In this paper, we gave a formal explanation of why gains can be obtained when BICM-T is 
used in nonfading channels. BICM-T was shown to be a TCM transmitter used with a BICM 
receiver. An analytical model was developed and a new type of distance spectrum for the code 
was introduced, which is the relevant characteristic to optimize CCs for BICM-T. The analytical 
model was used to validate the numerical results and to show that the use of the ODSCCs, which 
rely on the regular minimum free distance criterion, is suboptimal. 

For simplicity, the analysis presented in this paper was done only for a simple BICM 
configuration, and therefore, it is still unknown what the performance gains will be in a more 
general setup, e.g., when the number of encoder outputs differs from the number of modulator 
inputs, for different spectral efficiencies, or when a less trivial (but still not infinitely long and 
random) interleaver is used. All these questions are left for further investigation. 
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Fig. 1. Model of BICM transmission. 
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Fig. 2. BICM-T system analyzed in this paper for any time instant t. 
Fig. 3. Piece-wise relation between the L-values $A_t(e_t, s_t)$ in (17) and the received signal $y_t$ for 4-PAM for all the possible 
values of $e_t$ and $s_t$. The relation for the cases when $e_t = [1, 0]$ or $e_t = [0, 1]$ is shown in a), and the relation when $e_t = [1, 1]$ 
is shown in b). The transmitted symbols are shown with black squares. 
Fig. 4. BER for BICM using the (5, 7) and (247, 371) ODSCCs [13] and 4-PAM labeled with the BRGC, for BICM-S 
[5], BICM-M [7], and BICM-T. The simulations are shown with markers and the UB with solid lines. The asymptotic UB is 
shown with dashed lines. 
Fig. 5. Values of $A_c$ and $M_c$ for all the possible CCs with K = 5. The ODSCC and the AOCC are shown with filled markers. 
Fig. 6. Values of $A_c$ and $M_c$ for all the possible CCs with K = 6. The ODSCC and the AOCC are shown with filled markers. 
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TABLE I 

Values of $a(e_t, s_t)$ and $b(e_t, s_t)$ in (23) for 4-PAM found by direct inspection of Fig. 3. 
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[1,1] 
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[0,0] 
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a(e t , s t ) 


b(e t , s t ) 
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TABLE II 

Values of $\mu(e_t, s_t)$ and $\sigma^2(e_t, s_t)$ given in (26) and (27) for 4-PAM. 
TABLE III 

AOCCs, ODSCCs, coefficients $A_c$ and $M_c$, and AGs. 

Columns: K | AOCCs: $(g_1, g_2)$, $d_H^{\mathrm{free}}$, $A_c$, $M_c$ | ODSCCs: $(g_1, g_2)$, $d_H^{\mathrm{free}}$ | AG [dB]: $\mathrm{AG}_{S \to T}$, $\mathrm{AG}_{\mathrm{UC} \to T}$ 
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5 


9 


0.50 
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5 
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2.55 


4 
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